Proposal: Modification of Receive Split proposal. #4036

bwplotka · 2021-04-09T10:32:55Z

👋🏽 @Chans321 @brancz @GiedriusS @kakkoyun @squat @yashrsharma44 @metalmatze

Sorry for not paying attention to the past proposal, but when we revisited and tried to review #3845 and #3580 with @kakkoyun we found the potentially easier idea that might be more cleaner for users and maintainers. Feel free to give feedback!

First of all, thanks for all of your hard work on this, and experimental implementations in PR 🤗 We only want to propose different configuration really, but the idea stays exactly the same.

Some learnings for the future:

Let's not merge things prematurely. The previous proposal was missing some important parts:
- Goals were stating implementation goals, not the usage goals. We should focus on usage goals to avoid focusing too much on implementation and not seeing other options.
- Proposal was missing Alternatives.
- Motivation was motivating the implementation. I think it should rather motivate GOALS, not implementation yet.
- Motivation was not clear. Statements like Right now the receiver handles ingestion, routing and writing, thus leading to too many responsibilities in a single process. might need some proof.
- There were some left overs like -Additionally, we touch possibility for adding a consistent hashing mechanism and buffering the incoming requests. What that means?
- The related ticket was for something else - for different database reconfiguration @squat proposed. Given that there was not enough context for this proposal.
- Lack of migration guide for such breaking change of already stable receive.
Implementation PR led to confusions.

It was too large, ideally we can split work into smaller pieces. It was hard to review properly, agree on all items, not miss important piece etc.
It was marked as almost ready, which led @yashrsharma44 to spend time on rebasing and cleaning all of it. Instead, it was missing many elements like documentation, migration docs, mixings, dashboards, alerts that should be at least well planned ahead. If we would merge any of those two in this state it could led to even bigger confusion.

Signed-off-by: Bartlomiej Plotka bwplotka@gmail.com

brancz · 2021-04-09T11:16:42Z

docs/proposals/202012_receive_split.md


-### Drawbacks of the project
 There is no possible way to have a single-process receiver. The user must have a router + a receiver running.


If this is the only con, then there are simple solutions to solve this like adding the http endpoint.

This is left overall, sorry, will be more explicit

I clarified, could you double check?

brancz · 2021-04-09T11:18:16Z

I don't see sufficient reason here to change the direction of this project entirely when it's so close to the finish line, and two maintainers having signed off on the previous architecture. Also there is a docs PR out already.

docs/proposals/202012_receive_split.md

kakkoyun

Looks great. With this new proposed implementation, it draws more clear migration path and it's backwards compatible. It would much easier to roll these out.

kakkoyun · 2021-04-09T11:13:59Z

docs/proposals/202012_receive_split.md

-    4. The new architecture enables further performance improvements.
+[@squat](https://github.com/squat):
+
+> Currently, any change to the hashring configuration file will trigger all Thanos Receive nodes to flush their multi-TSDBs, causing them to enter an unready state until the flush is complete. This unavailability during a flush allows for a clear state transition, however it can result in downtimes on the order of five minutes for every configuration change. Moreover, during configuration changes, the hashring goes through an even longer period of partial unreadiness, where some nodes begin and finish flushing before and after others. During this partial unreadiness, the hashring can expect high internal request failure rates, which cause clients to retry their requests, resulting in even higher load. Therefore, when the hashring configuration is changed due to automatic horizontal scaling of a set of Thanos Receivers, the system can expect higher than normal resource utilization, which can create a positive feedback loop that continuously scales the hashring.


kakkoyun · 2021-04-09T11:14:39Z

docs/proposals/202012_receive_split.md

+not specifying:
+
+```yaml
+  --receive.local-endpoint=RECEIVE.LOCAL-ENDPOINT


kakkoyun · 2021-04-09T12:20:57Z

docs/proposals/202012_receive_split.md

+In comparison to previous proposal (as mentioned in [alternatives](#previous-proposal-separate-receive-route-command) we have big adventages:
+
+1. We can reduce number of components in Thanos system, we can reuse similar component flags and documentation. Users has to learn about one less command and in result Thanos design is much more approachable. Less components mean less maintainance, code and other implicit duties: Separate changelogs, issue confusions, boilerplates, etc.
+2. Allow consistent pattern with Query. We don't have separate StorAPI component for proxying, we have that baked into Querier. This has been proven to be flexible and understandable, so I would like to propose similar pattern in Receive.


docs/proposals/202012_receive_split.md

bwplotka · 2021-04-09T12:40:55Z

I don't see sufficient reason here to change the direction of this project entirely when it's so close to the finish line, and two maintainers having signed off on the previous architecture. Also there is a docs PR out already.

I understand your @brancz feelings and it's entirely our bad we did not participate in this discussion, however. It's my fault. I would still love if you could take a look on 3 aspects:

change the direction of this project entirely

Is it entirely changing? From user there is no much different. Indeed it changing a lot of implementation wise - it avoids tons of work which is good?

when it's so close to the finish line

I would argue with this. I don't think it's close for proper usage by users. Also it's creating breaking changes which violates the stability guarantee we made with receive. Of course we are not in 1.0 but I would expect some migration plan for receive useage. I mentioned reasoning in #4036 (comment) point 2.

This proposal here allows to avoid tricky migration.

, and two maintainers having signed off on the previous architecture

What's the difference if 1, two or hundred of maintainers signed off if the alternative we propose here was not considered (at least in the view of previously written proposal)?

I explained the rationales in Proposal. The need of single-process receive and route you commented on is arguable and really low priority issue. The main issue is the user confusion of architectural design and each components purpose, boileplate of another command, inconsistency with Querier pattern. I would love to at least look and consider In my opinion easier solution.

kakkoyun · 2021-04-09T12:51:39Z

@brancz @bwplotka I also want to see this push through the finish line as quickly as possible. If could just tweak the proposal and modify the existing implementation, it'd be faster to merge this. We can even make this happen for v0.20.0 by pushing the cut of RC for a week (we just released a major version recently anyways). I'm happy to help with the effort.

Current implementation changes the interface of the existing commands drastically, there is no easy migration path using current implementation. Maybe we should acknowledge the proposed design and implementation drift apart.

FWIW, either we can proposal or implementation, I want to have these changes on the main branch. Initial benchmarks by @metalmatze proved that there are nice gains in here.

docs/proposals/202012_receive_split.md

yashrsharma44

Some nitpick and few questions from my side.
Otherwise LGTM 🥇

docs/proposals/202012_receive_split.md

brancz · 2021-04-12T07:49:52Z

I'm not necessarily arguing with the technological direction (I do agree this is a good, maybe better direction), but I find it worrisome that we let parts of the community work on something for a long time only to step in and "know it better" and turn over a lot of work someone has put into something. This is being everything else but welcoming to new contributors, and I've noticed this behavior more than once. Part of letting people grow is allowing them to make mistakes and not stepping in and having to be part of every conversation.

Clearly, there are enough maintainers to back this proposal up from a technical perspective, so I'm not going to block this proposal, but for the record, I want to state that I find this behavior damaging for the Thanos project as a whole. Not everyone can always be part of every conversation and we need to accept that, even if it means technological decisions may be made that an individual may not necessarily agree with.

bwplotka · 2021-04-12T19:59:42Z

Thanks, Frederic, this is important to all of us. As we discussed offline, there were lots of offline chats and even meeting between @kakkoyun, @Chans321 @yashrsharma44. The trigger for this proposal was actually done on pre-last Contributor Thanos Hours, which is also recorded and available here:

Topic: Thanos Contributor Office Hours
Date: Apr 1, 2021 11:00 AM Universal Time UTC

Meeting Recording:
https://zoom.us/rec/share/VoV2AVCrZYmvaPk27kSftXnTAtVEb-OQqkEiYVxfXNLFWfdSu_sNiAr7CL92888Z.seIHezcDVw2szx8f

Unfortunately, you as the mentor of @Chans321 were not updated on all of those communications, sorry for that!

It's true we should be welcoming, but it's also worth to mention we started this work in Community Bridge Q4 2020. Lot's of new ideas came after that I think are fair to consider.

Not everyone can always be part of every conversation and we need to accept that, even if it means technological decisions may be made that an individual may not necessarily agree with.

Yea, but I treat this proposal as an improvement over the past work, all reusing the previous work and foundations.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

bwplotka

Updated, please PTAL again @yashrsharma44 @kakkoyun

Thanks for good review @yashrsharma44 !

docs/proposals/202012_receive_split.md

yashrsharma44

LGTM 🔥 , just small questions about wording of the statement 😁

docs/proposals/202012_receive_split.md

yashrsharma44

Resolved the issues after Office Hours, looks good to me!

Co-authored-by: Yash Sharma <yashrsharma44@gmail.com>

docs/proposals/202012_receive_split.md

GiedriusS

LGTM modulo that one grammar nit. It indeed sucks a lot that we have some initial implementations of the previous version but at the same time I feel that it won't be too much work to update them for this improved version as we are only changing how this new mode will be turned on, right?

Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>

bwplotka · 2021-04-27T14:18:49Z

Thank you all for feedback and reviews!

bwplotka requested review from kakkoyun, metalmatze, GiedriusS, brancz and squat April 9, 2021 10:32

This was referenced Apr 9, 2021

receiver: Split - Upstream the changes from the PR #3580 #3845

Closed

[WIP] receiver: Split #3580

Closed

brancz reviewed Apr 9, 2021

View reviewed changes

bwplotka commented Apr 9, 2021

View reviewed changes

docs/proposals/202012_receive_split.md Outdated Show resolved Hide resolved

docs/proposals/202012_receive_split.md Outdated Show resolved Hide resolved

kakkoyun previously approved these changes Apr 9, 2021

View reviewed changes

bwplotka dismissed kakkoyun’s stale review via f54bf6e April 9, 2021 12:31

bwplotka force-pushed the rule-split branch from 6e380ec to f54bf6e Compare April 9, 2021 12:31

bwplotka commented Apr 10, 2021

View reviewed changes

docs/proposals/202012_receive_split.md Outdated Show resolved Hide resolved

yashrsharma44 reviewed Apr 10, 2021

View reviewed changes

Proposal: Modification of Receive Split proposal.

55fdfb6

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

bwplotka force-pushed the rule-split branch from f54bf6e to 55fdfb6 Compare April 21, 2021 08:34

bwplotka commented Apr 21, 2021

View reviewed changes

yashrsharma44 reviewed Apr 21, 2021

View reviewed changes

docs/proposals/202012_receive_split.md Outdated Show resolved Hide resolved

docs/proposals/202012_receive_split.md Show resolved Hide resolved

docs/proposals/202012_receive_split.md Show resolved Hide resolved

yashrsharma44 reviewed Apr 21, 2021

View reviewed changes

docs/proposals/202012_receive_split.md Show resolved Hide resolved

yashrsharma44 previously approved these changes Apr 22, 2021

View reviewed changes

Update docs/proposals/202012_receive_split.md

bb5d5f7

Co-authored-by: Yash Sharma <yashrsharma44@gmail.com>

bwplotka dismissed yashrsharma44’s stale review via bb5d5f7 April 26, 2021 19:21

GiedriusS reviewed Apr 26, 2021

View reviewed changes

docs/proposals/202012_receive_split.md Outdated Show resolved Hide resolved

GiedriusS previously approved these changes Apr 26, 2021

View reviewed changes

Update docs/proposals/202012_receive_split.md

ae670ff

Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>

bwplotka dismissed GiedriusS’s stale review via ae670ff April 27, 2021 05:53

bwplotka merged commit 45a684e into main Apr 27, 2021

bwplotka deleted the rule-split branch April 27, 2021 14:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Modification of Receive Split proposal. #4036

Proposal: Modification of Receive Split proposal. #4036

bwplotka commented Apr 9, 2021 •

edited

Loading

brancz Apr 9, 2021 •

edited

Loading

bwplotka Apr 9, 2021

bwplotka Apr 9, 2021

brancz commented Apr 9, 2021 •

edited

Loading

kakkoyun left a comment

kakkoyun Apr 9, 2021

kakkoyun Apr 9, 2021

kakkoyun Apr 9, 2021

bwplotka commented Apr 9, 2021

kakkoyun commented Apr 9, 2021

yashrsharma44 left a comment

brancz commented Apr 12, 2021 •

edited

Loading

bwplotka commented Apr 12, 2021

bwplotka left a comment

yashrsharma44 left a comment

yashrsharma44 left a comment

GiedriusS left a comment

bwplotka commented Apr 27, 2021


		### Drawbacks of the project
		There is no possible way to have a single-process receiver. The user must have a router + a receiver running.

Proposal: Modification of Receive Split proposal. #4036

Proposal: Modification of Receive Split proposal. #4036

Conversation

bwplotka commented Apr 9, 2021 • edited Loading

brancz Apr 9, 2021 • edited Loading

Choose a reason for hiding this comment

bwplotka Apr 9, 2021

Choose a reason for hiding this comment

bwplotka Apr 9, 2021

Choose a reason for hiding this comment

brancz commented Apr 9, 2021 • edited Loading

kakkoyun left a comment

Choose a reason for hiding this comment

kakkoyun Apr 9, 2021

Choose a reason for hiding this comment

kakkoyun Apr 9, 2021

Choose a reason for hiding this comment

kakkoyun Apr 9, 2021

Choose a reason for hiding this comment

bwplotka commented Apr 9, 2021

kakkoyun commented Apr 9, 2021

yashrsharma44 left a comment

Choose a reason for hiding this comment

brancz commented Apr 12, 2021 • edited Loading

bwplotka commented Apr 12, 2021

bwplotka left a comment

Choose a reason for hiding this comment

yashrsharma44 left a comment

Choose a reason for hiding this comment

yashrsharma44 left a comment

Choose a reason for hiding this comment

GiedriusS left a comment

Choose a reason for hiding this comment

bwplotka commented Apr 27, 2021

bwplotka commented Apr 9, 2021 •

edited

Loading

brancz Apr 9, 2021 •

edited

Loading

brancz commented Apr 9, 2021 •

edited

Loading

brancz commented Apr 12, 2021 •

edited

Loading